Density Micro-Clustering Algorithms on Data Streams: A Review
Abstract
Data streams are massive, fast-changing, and potentially unbounded. Their applications range from scientific and astronomical work to business and finance. Algorithms that process them must work in a single pass with limited time and memory. Mining data streams is concerned with extracting knowledge structures, represented as models and patterns, from continuous data streams. Clustering, which groups similar objects together, is a prominent task in mining data streams. Several of the clustering algorithms introduced for data streams in recent years are distance-based, so they can find only spherical clusters. Density-based clustering algorithms have therefore been adopted for data streams, because they can discover clusters of arbitrary shape and are robust to outliers. In density-based clustering, dense regions of objects in the data space are treated as clusters, and these are separated by low-density regions (noise). In stream clustering, however, the data are massive and unbounded, so it is impossible to store every object. Micro-clustering is a stream clustering technique that maintains compact summary information about the data objects in a stream. A micro-cluster is a temporal extension of the cluster feature, which compresses the data effectively. In this paper, we review the outstanding density-based clustering algorithms for data streams that use micro-clusters. We explore the characteristics of these algorithms and analyze their merits and limitations.
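To make the idea of a micro-cluster as a "temporal extension of the cluster feature" concrete, the following is a minimal sketch under stated assumptions, not the definition used by any particular algorithm in the review. It assumes the cluster feature is the usual additive triple (point count, linear sum, squared sum) and that the temporal extension adds the sum and squared sum of arrival timestamps. The class name `MicroCluster` and all method names are hypothetical.

```python
import math


class MicroCluster:
    """Hypothetical sketch of a micro-cluster summary; names are illustrative."""

    def __init__(self, dim):
        self.n = 0              # number of absorbed points
        self.ls = [0.0] * dim   # per-dimension linear sum of the points
        self.ss = [0.0] * dim   # per-dimension sum of squared values
        self.ts_sum = 0.0       # sum of timestamps (assumed temporal extension)
        self.ts_sq_sum = 0.0    # sum of squared timestamps

    def insert(self, point, timestamp):
        """Absorb one point in O(dim) time without storing the point itself."""
        self.n += 1
        for d, x in enumerate(point):
            self.ls[d] += x
            self.ss[d] += x * x
        self.ts_sum += timestamp
        self.ts_sq_sum += timestamp * timestamp

    def center(self):
        """Mean of the absorbed points, recovered from the linear sum."""
        return [s / self.n for s in self.ls]

    def radius(self):
        """Root-mean-square distance of the absorbed points from the center."""
        var = sum(ss / self.n - (ls / self.n) ** 2
                  for ls, ss in zip(self.ls, self.ss))
        return math.sqrt(max(var, 0.0))  # clamp tiny negative rounding errors

    def mean_timestamp(self):
        """Average arrival time, a rough indicator of how recent the summary is."""
        return self.ts_sum / self.n

    def merge(self, other):
        """Every component is additive, so merging is a component-wise sum."""
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss = [a + b for a, b in zip(self.ss, other.ss)]
        self.ts_sum += other.ts_sum
        self.ts_sq_sum += other.ts_sq_sum


if __name__ == "__main__":
    mc = MicroCluster(dim=2)
    mc.insert([0.0, 0.0], timestamp=1.0)
    mc.insert([2.0, 0.0], timestamp=2.0)
    print(mc.center(), mc.radius(), mc.mean_timestamp())  # [1.0, 0.0] 1.0 1.5
```

Because every field is a running sum, the summary occupies constant memory per micro-cluster regardless of how many points it has absorbed, which is what makes a single pass over the stream feasible. Density-based stream algorithms typically add further machinery on top of such a summary (for example, decaying weights for old points), which this sketch omits.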
Related articles
A Study of the Problems of the DBSCAN Clustering Algorithm and a Review of the Improvements Proposed for It
Clustering is an important knowledge discovery technique in databases. Density-based clustering algorithms are among the main methods for clustering in data mining. These algorithms have some special features, including independence from the shape of the clusters, high understandability, and ease of use. DBSCAN is a base algorithm for density-based clustering algorithms. DBSCAN is able to...
Density-Based Clustering over an Evolving Data Stream with Noise
Clustering is an important task in mining evolving data streams. Besides the limited memory and one-pass constraints, the nature of evolving data streams implies the following requirements for stream clustering: no assumption on the number of clusters, discovery of clusters with arbitrary shape, and the ability to handle outliers. While a lot of clustering algorithms for data streams have been propos...
SOTXTSTREAM: Density-based self-organizing clustering of text streams
A streaming data clustering algorithm is presented building upon the density-based self-organizing stream clustering algorithm SOSTREAM. Many density-based clustering algorithms are limited by their inability to identify clusters with heterogeneous density. SOSTREAM addresses this limitation through the use of local (nearest neighbor-based) density determinations. Additionally, many stream clus...
Anytime Concurrent Clustering of Multiple Streams with an Indexing Tree
With the advancement of data generation technologies such as sensor networks, multiple data streams are continuously generated. Clustering multiple data streams is challenging as the requirement of clustering at anytime becomes more critical. We aim to cluster multiple data streams concurrently and in this paper we report our work in progress. ClusTree is an anytime clustering algorithm for a s...
Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories
In recent years, the tremendous and increasing growth of spatial trajectory data and the necessity of processing and extraction of useful information and meaningful patterns have led to the fact that many researchers have been attracted to the field of spatio-temporal trajectory clustering. The process and analysis of these trajectories have resulted in the extraction of useful information whic...
Publication date: 2011